A
a+ file open mode

B
binary digits

bit

byte

C
character set

characters

D
data hierarchy

database

database management system (DBMS)

decimal digits

E
end-of-file indicator

F
feof

fgetc

field

file

file open mode

file pointer

file position pointer

FILE structures

fopen

fputc

fread

fseek

fwrite

O
open a file

R
r+ file open mode

randomly accessed file

record

record key

S
SEEK_CUR

SEEK_END

SEEK_SET

sequential access file

sizeof operator

standard error

standard input

standard output

W
w+ file open mode

The FILE structure is operating system dependent (i.e., the members of the structure vary among systems based on how each system handles its files).

I read part of it all the way through.
Samuel Goldwyn
Hats off!
The flag is passing by.
Henry Holcomb Bennett
Consciousness ... does not appear to itself chopped up in bits. ... A "river" or a "stream" are the metaphors by which it is most naturally described.
William James
I can only assume that a "Do Not File" document is filed in a "Do Not File" file.
Senator Frank Church

Senate Intelligence Subcommittee Hearing, 1975

Fig. 23.1 - The data hierarchy.
Fig. 23.2 - C's view of a file of n bytes.
Fig. 23.3 - Creating a sequential file.
Fig. 23.4 - End-of-file key combinations for various popular computer systems.
Fig. 23.5 - The relationship between FILE pointers, FILE structures and FCBs.
Fig. 23.6 - File open modes.
Fig. 23.7 - Reading and printing a sequential file.
Fig. 23.8 - Credit inquiry program.
Fig. 23.10 - View of a randomly accessed file with fixed-length records.
Fig. 23.11 - Creating a random access file sequentially.
Fig. 23.12 - Writing data randomly to a randomly accessed file.
Fig. 23.14 - The file position pointer indicating an offset of 5 bytes from the beginning of the file.
Fig. 23.15 - Reading a random access file sequentially.
Fig. 23.16 - Bank account program.

Figure 23.1 - The data hierarchy.

Figure 23.2 - C's view of a file of n bytes.

Figure 23.4 - End-of-file key combinations for various popular computer systems.

Figure 23.5 - The relationship between FILE pointers, FILE structures and FCBs (Part 1 of 2).

Figure 23.5 - The relationship between FILE pointers, FILE structures and FCBs (Part 2 of 2).

Figure 23.6 - File open modes.

Figure 23.10 - View of a randomly accessed file with fixed-length records.

Figure 23.14 - The file position pointer indicating an offset of 5 bytes from the beginning of the file.

Figure 23.3 - Creating a sequential file.

Figure 23.7 - Reading and printing a sequential file.

Figure 23.8 - Credit inquiry program.

Figure 23.11 - Creating a random access file sequentially.

Figure 23.12 - Writing data randomly to a randomly accessed file.

Figure 23.15 - Reading a random access file sequentially.

Figure 23.16 - Bank account program.

Be sure that calls to file processing functions in a program contain the correct file pointers.
Explicitly close each file as soon as it is known that the program will not reference the file again.
Open a file only for reading (and not update) if the contents of the file should not be modified. This prevents unintentional modification of the file's contents. This is another example of the principle of least privilege.

Closing a file can free resources for which other users or programs may be waiting.
Many programmers mistakenly think sizeof is a function, and that using it generates the execution-time overhead of a function call. There is no such overhead because sizeof is a compile-time operator.
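A minimal sketch of what "compile-time operator" means in practice. The macro and function names are hypothetical. Because sizeof is an operator, the common ARRAY_LENGTH idiom costs nothing at execution time, and the operand of sizeof is not evaluated at all, so counter++ below never runs. (The one exception is a C99 variable-length array, whose size sizeof must compute at execution time. The sketch avoids them.)

```c
#include <stddef.h>

/* Number of elements in a true array (not a pointer). Both sizeof
   expressions are compile-time constants for ordinary arrays. */
#define ARRAY_LENGTH(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical demo: sizeof does not evaluate its operand, so the
   increment never happens and the function returns 0. */
int sizeofDoesNotEvaluate(void)
{
   int counter = 0;
   size_t bytes = sizeof(counter++);   /* counter++ is NOT executed */

   (void) bytes;                       /* silence "unused variable" */
   return counter;
}
```

For example, given double values[7];, ARRAY_LENGTH(values) is 7 on any system. Note that the macro gives the wrong answer if it is applied to a pointer parameter instead of an array.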

Answers 23.1
a) 1s, 0s. b) Bit. c) File. d) Characters. e) Database. f) fclose. g) fscanf. h) getc or fgetc. i) fgets. j) fopen. k) fread. l) fseek.
Answers 23.2
a)False. Function fscanf can be used to read from the standard input by including the pointer to the standard input stream, stdin, in the call to fscanf.
b)False. These three streams are opened automatically by C when program execution begins.
c)False. The files will be closed when program execution terminates, but all files should be explicitly closed with fclose.
d)False. Function rewind can be used to reposition the file position pointer to the beginning of the file.
e)True.
f)False. In most cases, sequential file records are not of uniform length. Therefore, it is possible that updating a record will cause other data to be overwritten.
g)True.
h)False. Records in a random access file are normally of uniform length.
i)False. It is possible to seek from the beginning of the file, from the end of the file, and from the current location in the file according to the file position pointer.
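Answers d) and i) can be checked with a short sketch. The helper names are hypothetical. The sketch writes ten bytes to a temporary file created by tmpfile, moves the file position pointer relative to each of the three origins, reports each position with ftell, and finally uses rewind to return to the beginning without closing and reopening the file. The C standard allows a binary stream not to support SEEK_END meaningfully, although common desktop implementations do.

```c
#include <stdio.h>

/* Hypothetical helper: moves the file position pointer of fp by offset
   bytes relative to origin (SEEK_SET, SEEK_CUR or SEEK_END) and returns
   the new position, or -1 if fseek fails. */
long positionAfterSeek(FILE *fp, long offset, int origin)
{
   if (fseek(fp, offset, origin) != 0)
      return -1L;
   return ftell(fp);
}

/* Hypothetical demo: writes 10 bytes to a temporary file, then records
   the position reached from each origin and after rewind.
   Returns 0 on success, -1 if the temporary file cannot be created. */
int seekDemo(long results[4])
{
   FILE *fp = tmpfile();   /* opened in "wb+" mode, deleted when closed */

   if (fp == NULL)
      return -1;

   fputs("0123456789", fp);                            /* position is now 10 */
   results[0] = positionAfterSeek(fp, 5L, SEEK_SET);   /* 5 */
   results[1] = positionAfterSeek(fp, 2L, SEEK_CUR);   /* 5 + 2 = 7 */
   results[2] = positionAfterSeek(fp, -3L, SEEK_END);  /* 10 - 3 = 7 */
   rewind(fp);                                         /* back to the start */
   results[3] = ftell(fp);                             /* 0 */

   fclose(fp);
   return 0;
}
```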
Answers 23.3
a)ofPtr = fopen("oldmast.dat", "r");
b)tfPtr = fopen("trans.dat", "r");
c)nfPtr = fopen("newmast.dat", "w");
d)fscanf(ofPtr, "%d%s%f", &accountNum, name, &currentBalance);
e)fscanf(tfPtr, "%d%f", &accountNum, &dollarAmount);
f)fprintf(nfPtr, "%d %s %.2f", accountNum, name, currentBalance);
Answers 23.4
a)Error: The file "payables.dat" has not been opened before the reference to its file pointer.
Correction: Use fopen to open "payables.dat" for writing, appending, or updating.
b)Error: The function open is not an ANSI C function.
Correction: Use function fopen.
c)Error: The fscanf statement uses the incorrect file pointer to refer to file "payables.dat".
Correction: Use file pointer payPtr to refer to "payables.dat".
d)Error: The contents of the file are discarded because the file is opened for writing ("w").
Correction: To add data to the file, either open the file for updating ("r+") or open the file for appending ("a").
e)Error: File "courses.dat" is opened for updating in "w+" mode which discards the current contents of the file.
Correction: Open the file in append ("a") mode.
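The difference behind answers d) and e) is easy to observe. In this sketch the file name "modes_demo.txt" and both helper names are hypothetical. The sketch writes a file, reopens it with "a" (the existing text is kept and the new text is added at the end), then reopens it with "w" (the existing text is discarded). It measures the length after each step.

```c
#include <stdio.h>

/* Hypothetical helper: writes text to the file name using the given
   open mode. Returns 0 on success, -1 if fopen fails. */
int writeWithMode(const char *name, const char *mode, const char *text)
{
   FILE *fp = fopen(name, mode);

   if (fp == NULL)
      return -1;
   fputs(text, fp);
   fclose(fp);
   return 0;
}

/* Hypothetical helper: counts the characters in a file, or returns -1
   if the file cannot be opened for reading. */
long fileLength(const char *name)
{
   FILE *fp = fopen(name, "r");
   long count = 0;

   if (fp == NULL)
      return -1L;
   while (fgetc(fp) != EOF)
      count++;
   fclose(fp);
   return count;
}
```

Writing "12345" with "w" gives a length of 5. Appending "678" with "a" gives 8. Writing "9" with "w" then leaves a length of 1, because the earlier contents were discarded without warning.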
Answers 23.5

/* Exercise 23.5 Solution */
/* NOTE: This program was run using the */
/* data in Exercise 23.8 */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
   int masterAccount, transactionAccount;
   float masterBalance, transactionBalance;
   char masterName[23];
   int haveMaster;   /* nonzero while a valid master record is in memory */
   FILE *ofPtr, *tfPtr, *nfPtr;

   if ((ofPtr = fopen("oldmast.dat", "r")) == NULL) {
      printf("Unable to open oldmast.dat\n");
      exit(1);
   }

   if ((tfPtr = fopen("trans.dat", "r")) == NULL) {
      printf("Unable to open trans.dat\n");
      exit(1);
   }

   if ((nfPtr = fopen("newmast.dat", "w")) == NULL) {
      printf("Unable to open newmast.dat\n");
      exit(1);
   }

   printf("Processing....\n");

   /* fscanf returns the number of items converted, so a read succeeded
      only if it returns 3 (master record) or 2 (transaction record) */
   haveMaster = fscanf(ofPtr, "%d%22s%f", &masterAccount, masterName,
                       &masterBalance) == 3;

   while (fscanf(tfPtr, "%d%f", &transactionAccount,
                 &transactionBalance) == 2) {

      /* masters with no transaction are copied to the new file unchanged */
      while (haveMaster && masterAccount < transactionAccount) {
         fprintf(nfPtr, "%d %s %.2f\n", masterAccount, masterName,
                 masterBalance);
         printf("%d %s %.2f\n", masterAccount, masterName, masterBalance);
         haveMaster = fscanf(ofPtr, "%d%22s%f", &masterAccount, masterName,
                             &masterBalance) == 3;
      }

      /* a match updates the balance; the record stays in memory so that a
         later transaction for the same account is also applied to it */
      if (haveMaster && masterAccount == transactionAccount)
         masterBalance += transactionBalance;
      else
         printf("Unmatched transaction record for account %d\n",
                transactionAccount);
   }

   /* copy the remaining master records, including any still in memory */
   while (haveMaster) {
      fprintf(nfPtr, "%d %s %.2f\n", masterAccount, masterName,
              masterBalance);
      printf("%d %s %.2f\n", masterAccount, masterName, masterBalance);
      haveMaster = fscanf(ofPtr, "%d%22s%f", &masterAccount, masterName,
                          &masterBalance) == 3;
   }

   fclose(ofPtr);
   fclose(tfPtr);
   fclose(nfPtr);
   return 0;
}


Processing....
100 Jones 375.31
300 Smith 89.30
Unmatched transaction record for account 400
500 Sharp 0.00
700 Green -14.22
Unmatched transaction record for account 900

Answers 23.6

/* Exercise 23.6 Solution */
#include <stdio.h>

#include <string.h>

#include <ctype.h>

#include <stdlib.h>


void initializeFile(FILE *);

void inputData(FILE *);

int instructions(void);

void listTools(FILE *);

void updateRecord(FILE *);

void insertRecord(FILE *);

void deleteRecord(FILE *);

struct hardwareData {

   int partNumber;

   char toolName[30];

  int inStock;

   float unitPrice;

};


int main(void)

{

   FILE *filePtr;

   char response[2];

   int process;


   printf("Should the file be initialized (Y or N): ");

   scanf("%1s", response);   /* %1s stores one character plus '\0' */

      

   while (toupper(response[0]) != 'Y' && toupper(response[0]) != 'N') {

      printf("Invalid response. Enter Y or N: ");

      scanf("%1s", response);

   }

      

   if (toupper(response[0]) == 'Y') {

      if ((filePtr = fopen("hardware.dat", "w")) == NULL) {

         printf("File could not be opened.\n");
         exit(1);

      }

         

      initializeFile(filePtr);

      inputData(filePtr);

      fclose(filePtr);

   }

            

   if ((filePtr = fopen("hardware.dat", "r+")) == NULL) {

      printf("File could not be opened.\n");

      exit(1);

   }
   while ((process = instructions()) != 5) {

      switch (process) {

         case 1:

            listTools(filePtr);

            break;

         case 2:

            updateRecord(filePtr);

            break;

         case 3:

            insertRecord(filePtr);

            break;

         case 4:
            deleteRecord(filePtr);

            break;            

      }

   }

   

   fclose(filePtr);

   return 0;

}


void initializeFile(FILE *fPtr)
{
   struct hardwareData blankItem = {0, "", 0, 0.0};
   int i;

   /* write 100 empty records (record numbers 0-99) */
   for (i = 0; i <= 99; i++)
      fwrite(&blankItem, sizeof(struct hardwareData), 1, fPtr);
}


void inputData(FILE *fPtr)
{
   struct hardwareData temp;

   /* record 0 is never used: listTools starts at record 1, and a
      partNumber of 0 marks an empty record */
   printf("Enter the partnumber (1 - 99, -1 to end input): ");
   scanf("%d", &temp.partNumber);

   while (temp.partNumber != -1) {
      if (temp.partNumber < 1 || temp.partNumber > 99)
         printf("Invalid part number.\n");
      else {
         printf("Enter the tool name, quantity, and price:\n");
         /* the name is everything up to the first digit (at most 29 chars) */
         scanf(" %29[^0-9]%d%f", temp.toolName, &temp.inStock,
               &temp.unitPrice);
         fseek(fPtr, temp.partNumber * sizeof(struct hardwareData),
               SEEK_SET);
         fwrite(&temp, sizeof(struct hardwareData), 1, fPtr);
      }

      printf("Enter the partnumber (1 - 99, -1 to end input): ");
      scanf("%d", &temp.partNumber);
   }
}


int instructions(void) 

{

   int choice;

   printf("\n%s\n%s\n%s\n%s\n%s\n%s\n? ", "Enter a choice:",
          "1 List all tools.", "2 Update record.", "3 Insert record.",
          "4 Delete record.", "5 End program.");
   scanf("%d", &choice);

   

   while (choice < 1 || choice > 5) {

      printf("Invalid choice. Enter again: ");

      scanf("%d", &choice);

   }

   

   return choice;

}

void listTools(FILE *fPtr)
{
   struct hardwareData temp;

   /* start at record 1; record 0 is never used */
   fseek(fPtr, sizeof(struct hardwareData), SEEK_SET);
   printf("%8s  %-29s%10s%8s\n", "Record #", "Tool name", "Quantity",
          "Cost");

   /* testing fread's return value (rather than feof) ends the loop
      without processing the last record a second time */
   while (fread(&temp, sizeof(struct hardwareData), 1, fPtr) == 1)
      if (temp.partNumber)
         printf("%-8d  %-29s%10d%8.2f\n", temp.partNumber,
                temp.toolName, temp.inStock, temp.unitPrice);
}



void updateRecord(FILE *fPtr)
{
   struct hardwareData temp;
   int part;

   printf("Enter the partnumber for update: ");
   scanf("%d", &part);

   if (part < 1 || part > 99) {
      printf("Invalid part number.\n");
      return;
   }

   fseek(fPtr, part * sizeof(struct hardwareData), SEEK_SET);

   if (fread(&temp, sizeof(struct hardwareData), 1, fPtr) == 1 &&
       temp.partNumber) {
      printf("%8s  %-29s%10s%8s\n", "Record #", "Tool name", "Quantity",
             "Cost");
      printf("%-8d  %-29s%10d%8.2f\n", temp.partNumber, temp.toolName,
             temp.inStock, temp.unitPrice);
      printf("Enter the tool name, quantity, and price:\n");
      scanf(" %29[^0-9]%d%f", temp.toolName, &temp.inStock,
            &temp.unitPrice);
      /* reposition before writing: on a stream opened for update ("r+"),
         a read must not be followed directly by a write without an
         intervening file positioning call such as fseek */
      fseek(fPtr, part * sizeof(struct hardwareData), SEEK_SET);
      fwrite(&temp, sizeof(struct hardwareData), 1, fPtr);
   }
   else
      printf("Cannot update. The record is empty.\n");
}



void insertRecord(FILE *fPtr)
{
   struct hardwareData temp;
   int part;

   printf("Enter the partnumber for insertion: ");
   scanf("%d", &part);

   if (part < 1 || part > 99) {
      printf("Invalid part number.\n");
      return;
   }

   fseek(fPtr, part * sizeof(struct hardwareData), SEEK_SET);

   if (fread(&temp, sizeof(struct hardwareData), 1, fPtr) == 1 &&
       !temp.partNumber) {
      temp.partNumber = part;
      printf("Enter the tool name, quantity, and price:\n");
      scanf(" %29[^0-9]%d%f", temp.toolName, &temp.inStock,
            &temp.unitPrice);
      fseek(fPtr, part * sizeof(struct hardwareData), SEEK_SET);
      fwrite(&temp, sizeof(struct hardwareData), 1, fPtr);
   }
   else
      printf("Cannot insert. The record contains information.\n");
}

void deleteRecord(FILE *fPtr)
{
   struct hardwareData blankItem = {0, "", 0, 0.0}, temp;
   int part;

   printf("Enter the partnumber for deletion: ");
   scanf("%d", &part);

   if (part < 1 || part > 99) {
      printf("Invalid part number.\n");
      return;
   }

   fseek(fPtr, part * sizeof(struct hardwareData), SEEK_SET);

   if (fread(&temp, sizeof(struct hardwareData), 1, fPtr) == 1 &&
       temp.partNumber) {
      /* overwrite the record with an empty one */
      fseek(fPtr, part * sizeof(struct hardwareData), SEEK_SET);
      fwrite(&blankItem, sizeof(struct hardwareData), 1, fPtr);
      printf("Record deleted.\n");
   }
   else
      printf("Cannot delete. The record is empty.\n");
}

Answers 23.7

/* Exercise 23.7 Solution */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
   FILE *outPtr;

   if ((outPtr = fopen("datasize.dat", "w")) == NULL) {
      printf("Unable to open datasize.dat\n");
      exit(1);
   }

   /* sizeof yields a size_t, which is printed with %zu (C99);
      %-20s left-justifies each name in a 20-character column */
   fprintf(outPtr, "%-20s%5s\n", "Data type", "Size");
   fprintf(outPtr, "%-20s%5zu\n", "char", sizeof(char));
   fprintf(outPtr, "%-20s%5zu\n", "unsigned char", sizeof(unsigned char));
   fprintf(outPtr, "%-20s%5zu\n", "short int", sizeof(short int));
   fprintf(outPtr, "%-20s%5zu\n", "unsigned short int",
           sizeof(unsigned short int));
   fprintf(outPtr, "%-20s%5zu\n", "int", sizeof(int));
   fprintf(outPtr, "%-20s%5zu\n", "unsigned int", sizeof(unsigned int));
   fprintf(outPtr, "%-20s%5zu\n", "long int", sizeof(long int));
   fprintf(outPtr, "%-20s%5zu\n", "unsigned long int",
           sizeof(unsigned long int));
   fprintf(outPtr, "%-20s%5zu\n", "float", sizeof(float));
   fprintf(outPtr, "%-20s%5zu\n", "double", sizeof(double));
   fprintf(outPtr, "%-20s%5zu\n", "long double", sizeof(long double));

   fclose(outPtr);
   return 0;
}

Data type            Size
char                    1
unsigned char           1
short int               2
unsigned short int      2
int                     2
unsigned int            2
long int                4
unsigned long int       4
float                   4
double                 10
long double            10

Exercise 23.1
Fill in the blanks in each of the following:
a)Ultimately, all data items processed by a computer are reduced to combinations of ________ and ________.
b)The smallest data item a computer can process is called a ________.
c)A ________ is a group of related records.
d)Digits, letters, and special symbols are referred to as ________.
e)A group of related files is called a ________.
f)The ________ function closes a file.
g)The ________ statement reads data from a file in a manner similar to how scanf reads from stdin.
h)The ________ function reads a character from a specified file.
i)The ________ function reads a line from a specified file.
j)The ________ function opens a file.
k)The ________ function is normally used when reading data from a file in random access applications.
l)The ________ function repositions the file position pointer to a specific location in the file.
Exercise 23.2
State which of the following are true and which are false (for those that are false, explain why):
a)Function fscanf cannot be used to read data from the standard input.
b)The programmer must explicitly use fopen to open the standard input, standard output, and standard error streams.
c)A program must explicitly call function fclose to close a file.
d)If the file position pointer points to a location in a sequential file other than the beginning of the file, the file must be closed and reopened to read from the beginning of the file.
e)Function fprintf can write to the standard output.
f)Data in sequential access files is always updated without overwriting other data.
g)It is not necessary to search through all the records in a randomly accessed file to find a specific record.
h)Records in randomly accessed files are not of uniform length.
i)Function fseek may only seek relative to the beginning of a file.
Exercise 23.3
Write a single statement to accomplish each of the following. Assume that each of these statements applies to the same program.
a)Write a statement that opens file "oldmast.dat" for reading and assigns the returned file pointer to ofPtr.
b)Write a statement that opens file "trans.dat" for reading and assigns the returned file pointer to tfPtr.
c)Write a statement that opens file "newmast.dat" for writing (and creation) and assigns the returned file pointer to nfPtr.
d)Write a statement that reads a record from the file "oldmast.dat". The record consists of integer accountNum, string name, and floating point currentBalance.
e)Write a statement that reads a record from the file "trans.dat". The record consists of integer accountNum and floating point dollarAmount.
f)Write a statement that writes a record to the file "newmast.dat". The record consists of integer accountNum, string name, and floating point currentBalance.
Exercise 23.4
Find the error in each of the following program segments. Explain how the error can be corrected.
a)The file referred to by fPtr ("payables.dat") has not been opened.
fprintf(fPtr, "%d%s%d\n", account, company, amount);
b)open("receive.dat", "r+");
c)The following statement should read a record from the file "payables.dat". File pointer payPtr refers to this file, and file pointer recPtr refers to the file "receive.dat".
fscanf(recPtr, "%d%s%d\n", &account, company, &amount);

d)The file "tools.dat" should be opened to add data to the file without discarding the current data.

if ((tfPtr = fopen("tools.dat", "w")) != NULL)

e)The file "courses.dat" should be opened for appending without modifying the current contents of the file.
if ((cfPtr = fopen("courses.dat", "w+")) != NULL)
Exercise 23.5
Exercise 23.3 asked the reader to write a series of single statements. Actually, these statements form the core of an important type of file processing program, namely, a file-matching program. In commercial data processing, it is common to have several files in each system. In an accounts receivable system, for example, there is generally a master file containing detailed information about each customer such as the customer's name, address, telephone number, outstanding balance, credit limit, discount terms, contract arrangements, and possibly a condensed history of recent purchases and cash payments.
As transactions occur (i.e., sales are made and cash payments arrive in the mail), they are entered into a file. At the end of each business period (i.e., a month for some companies, a week for others, and a day in some cases), the file of transactions (called "trans.dat" in Exercise 23.3) is applied to the master file (called "oldmast.dat" in Exercise 23.3), thus updating each account's record of purchases and payments. After each of these updating runs, the master file is rewritten as a new file ("newmast.dat"), which is then used at the end of the next business period to begin the updating process again.
File-matching programs must deal with certain problems that do not exist in single-file programs. For example, a match does not always occur. A customer on the master file may not have made any purchases or cash payments in the current business period, and therefore no record for this customer will appear on the transaction file. Similarly, a customer who did make some purchases or cash payments may have just moved to this community, and the company may not have had a chance to create a master record for this customer.
Use the statements written in Exercise 23.3 as a basis for writing a complete file-matching accounts receivable program. Use the account number on each file as the record key for matching purposes. Assume that each file is a sequential file with records stored in increasing account number order.
When a match occurs (i.e., records with the same account number appear on both the master file and the transaction file), add the dollar amount on the transaction file to the current balance on the master file, and write the "newmast.dat" record. (Assume that purchases are indicated by positive amounts on the transaction file, and that payments are indicated by negative amounts.) When there is a master record for a particular account but no corresponding transaction record, merely write the master record to "newmast.dat". When there is a transaction record but no corresponding master record, print the message "Unmatched transaction record for account number ..." (fill in the account number from the transaction record).
Exercise 23.6
You are the owner of a hardware store and need to keep an inventory that can tell you what tools you have, how many you have, and the cost of each one. Write a program that initializes the file "hardware.dat" to 100 empty records, lets you input the data concerning each tool, enables you to list all your tools, lets you delete a record for a tool that you no longer have, and lets you update any information in the file. The tool identification number should be the record number. Use the following information to start your file:

Exercise 23.7
Write a program that uses the sizeof operator to determine the sizes in bytes of the various data types on your computer system. Write the results to the file "datasize.dat" so you may print the results later. The format for the results in the file should match the sample output shown with the solution in Answers 23.7.

Note: The type sizes on your computer may not be the same as the ones listed in the table.

* To be able to create, read, write, and update files.
* To become familiar with sequential access file processing.
* To become familiar with random access file processing.

Opening an existing file for writing ("w") when, in fact, the user wants to preserve the file; the contents of the file are discarded without warning.
Forgetting to open a file before attempting to reference it in a program.
Using the wrong file pointer to refer to a file.
Opening a nonexistent file for reading.
Opening a file for reading or writing without having been granted the appropriate access rights to the file (this is operating-system dependent).
Opening a file for writing when no disk space is available. Figure 23.6 lists the file open modes.
Opening a file with the incorrect file mode can lead to devastating errors. For example, opening a file in write mode ("w") when it should be opened in update mode ("r+") causes the contents of the file to be discarded.

23 File Processing

23.1 Introduction
23.2 The Data Hierarchy
23.3 Files and Streams
23.4 Creating a Sequential Access File
23.5 Reading Data from a Sequential Access File
23.6 Random Access Files
23.7 Creating a Randomly Accessed File
23.8 Writing Data Randomly to a Randomly Accessed File
23.9 Reading Data Randomly from a Randomly Accessed File
23.10 Case Study: A Transaction Processing Program
23.11 Summary
Terminology
Figures

23.1 Introduction

Storage of data in variables and arrays is temporary; all such data is lost when a program terminates. Files are used for permanent retention of large amounts of data. Computers store files on secondary storage devices, especially disk storage devices. In this chapter, we explain how data files are created, updated, and processed by C programs. We consider both sequential access files and random access files.

23.2 The Data Hierarchy

Ultimately, all data items processed by a computer are reduced to combinations of zeros and ones. This occurs because it is simple and economical to build electronic devices that can assume two stable states--one of the states represents 0 and the other represents 1. It is remarkable that the impressive functions performed by computers involve only the most fundamental manipulations of 0s and 1s.
The smallest data item in a computer can assume the value 0 or the value 1. Such a data item is called a bit (short for "binary digit"--a digit that can assume one of two values). Computer circuitry performs various simple bit manipulations such as determining the value of a bit, setting the value of a bit, and reversing a bit (from 1 to 0 or from 0 to 1).
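The three bit manipulations just listed correspond directly to C's bitwise operators. This is a minimal sketch. The function names are hypothetical, bit 0 is taken to be the least significant bit, and bit must be less than the number of bits in an unsigned int.

```c
/* Hypothetical helpers for the three basic bit operations.
   Precondition: 0 <= bit < number of bits in unsigned int. */

/* Determine the value of a bit (returns 0 or 1). */
unsigned int getBit(unsigned int value, int bit)
{
   return (value >> bit) & 1u;
}

/* Set a bit to 1, leaving the other bits unchanged. */
unsigned int setBit(unsigned int value, int bit)
{
   return value | (1u << bit);
}

/* Reverse a bit (1 becomes 0, 0 becomes 1). */
unsigned int flipBit(unsigned int value, int bit)
{
   return value ^ (1u << bit);
}
```

For example, 5 is 101 in binary, so setBit(5u, 1) yields 7 (111), and flipBit(5u, 2) yields 1 (001).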
It is cumbersome for programmers to work with data in the low-level form of bits. Instead, programmers prefer to work with data in the form of decimal digits (i.e., 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9), letters (i.e., A through Z, and a through z), and special symbols (i.e., $, @, %, &, *, (, ), -, +, ", :, ?, /, and many others). Digits, letters, and special symbols are referred to as characters. The set of all characters that may be used to write programs and represent data items on a particular computer is called that computer's character set. Since computers can process only 1s and 0s, every character in a computer's character set is represented as a pattern of 1s and 0s (called a byte). Today, bytes are most commonly composed of eight bits. Programmers create programs and data items as characters; computers then manipulate and process these characters as patterns of bits.
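To make the idea of a character as a pattern of bits concrete, here is a small sketch. The function name is hypothetical. It uses CHAR_BIT from <limits.h> for the number of bits in a byte, and the example value assumes an ASCII-based character set, in which 'A' has the code 65.

```c
#include <limits.h>

/* Hypothetical helper: stores the bit pattern of c, most significant bit
   first, as a string of '0' and '1' characters. out must hold at least
   CHAR_BIT + 1 characters (one per bit plus the terminating '\0'). */
void charToBits(char c, char out[CHAR_BIT + 1])
{
   unsigned char byte = (unsigned char) c;   /* avoid sign problems */
   int i;

   for (i = 0; i < CHAR_BIT; i++)
      out[i] = ((byte >> (CHAR_BIT - 1 - i)) & 1) ? '1' : '0';
   out[CHAR_BIT] = '\0';
}
```

On an ASCII system with 8-bit bytes, charToBits('A', bits) fills bits with "01000001".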
Just as characters are composed of bits, fields are composed of characters. A field is a group of characters that conveys meaning. For example, a field consisting solely of uppercase and lowercase letters can be used to represent a person's name.
Data items processed by computers form a data hierarchy in which data items become larger and more complex in structure as we progress from bits, to characters (bytes), to fields, and so on.
A record (i.e., a struct in C) is composed of several fields. In a payroll system, for example, a record for a particular employee might consist of the following fields:
1. Social Security number
2. Name
3. Address
4. Hourly salary rate
5. Number of exemptions claimed
6. Year-to-date earnings
7. Amount of Federal taxes withheld, etc.
Thus, a record is a group of related fields. In the preceding example, each of the fields belongs to the same employee. Of course, a particular company may have many employees, and will have a payroll record for each employee. A file is a group of related records. A company's payroll file normally contains one record for each employee. Thus, a payroll file for a small company might contain only 22 records, whereas a payroll file for a large company might contain 100,000 records. It is not unusual for an organization to have hundreds or even thousands of files, with many containing millions or even billions of characters of information. With the increasing popularity of laser optical disks and multimedia technology, even trillion-byte files will soon be common. Fig. 23.1 illustrates the data hierarchy.
To facilitate the retrieval of specific records from a file, at least one field in each record is chosen as a record key. A record key identifies a record as belonging to a particular person or entity. For example, in the payroll record described in this section, the Social Security number would normally be chosen as the record key.
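In C, such a record is naturally a struct, and the record key is the field a program compares when it searches. This is a hedged sketch. The struct, its field sizes, and the search function are hypothetical illustrations of the payroll record above, not code from this chapter.

```c
#include <string.h>

/* Hypothetical payroll record; field names and sizes are illustrative. */
struct payrollRecord {
   char ssn[12];               /* record key, e.g. "123-45-6789" */
   char name[30];
   double hourlyRate;
   int exemptions;
   double yearToDateEarnings;
};

/* Returns the index of the record whose key equals ssn, or -1 if no
   record in the first count elements has that key. */
int findByKey(const struct payrollRecord records[], int count,
              const char *ssn)
{
   int i;

   for (i = 0; i < count; i++)
      if (strcmp(records[i].ssn, ssn) == 0)
         return i;
   return -1;
}
```

A linear search like this works on records in any order. When records are kept sorted by key, as in the sequential files described next, a program can stop as soon as it passes the key it is looking for.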
There are many ways of organizing records in a file. The most popular type of organization is called a sequential file, in which records are typically stored in order by the record key field. In a payroll file, records are usually placed in order by Social Security number. The first employee record in the file contains the lowest Social Security number, and subsequent records contain increasingly higher Social Security numbers.
Most businesses use many different files to store data. For example, companies may have payroll files, accounts receivable files (listing money due from clients), accounts payable files (listing money due to suppliers), inventory files (listing facts about all the items handled by the business), and many other types of files. A group of related files is sometimes called a database. A collection of programs designed to create and manage databases is called a database management system (DBMS).

23.3 Files and Streams

C views each file simply as a sequential stream of bytes (Fig. 23.2). Each file ends either with an end-of-file marker or at a specific byte number recorded in a system-maintained administrative data structure. When a file is opened, a stream is associated with the file. Three files and their associated streams are automatically opened when program execution begins--the standard input, the standard output, and the standard error. Streams provide communication channels between files and programs. For example, the standard input stream enables a program to read data from the keyboard, and the standard output stream enables a program to print data on the screen. Opening a file returns a pointer to a FILE structure (defined in <stdio.h>) that contains information used to process the file. On many systems, this structure includes a file descriptor, i.e., an index into an operating system array called the open file table. Each array element contains a file control block (FCB) that the operating system uses to administer a particular file. The standard input, standard output, and standard error are manipulated using file pointers stdin, stdout, and stderr.
The standard library provides many functions for reading data from files and for writing data to files.
Function fgetc, like getchar, reads one character from a file. Function fgetc receives as an argument a FILE pointer for the file from which a character will be read. The call fgetc(stdin) reads one character from stdin--the standard input. This call is equivalent to the call getchar(). Function fputc, like putchar, writes one character to a file. Function fputc receives as arguments a character to be written and a pointer for the file to which the character will be written. The function call fputc('a', stdout) writes the character 'a' to stdout--the standard output. This call is equivalent to putchar('a').
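Because both functions take a FILE pointer, one routine can copy between any two streams. This is a minimal sketch with a hypothetical function name. Calling copyCharacters(stdin, stdout) behaves like the familiar getchar/putchar copy loop. Note that the value returned by fgetc is kept in an int, not a char, so that EOF can be distinguished from every valid character.

```c
#include <stdio.h>

/* Hypothetical helper: copies every character from in to out and
   returns the number of characters copied. */
long copyCharacters(FILE *in, FILE *out)
{
   int c;              /* int, so EOF differs from every character */
   long count = 0;

   while ((c = fgetc(in)) != EOF) {
      fputc(c, out);
      count++;
   }
   return count;
}
```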
Several other functions used to read data from standard input and write data to standard output have similarly named file processing functions. The fgets and fputs functions, for example, can be used to read a line from a file and write a line to a file, respectively. Their counterparts for reading from standard input and writing to standard output are gets and puts. In the next several sections, we introduce the file processing equivalents of functions scanf and printf--fscanf and fprintf. Later in the chapter we discuss functions fread and fwrite.
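Line-oriented copying looks much the same with fgets and fputs. In this sketch, the function name and the 256-character buffer are assumptions. One caution about the counterparts named above: gets cannot limit how many characters it stores, and it was removed from the language in C11. fgets(buffer, size, stdin) is the usual replacement.

```c
#include <stdio.h>

/* Hypothetical helper: copies in to out one line at a time and returns
   the number of successful fgets calls. A line longer than the buffer
   is copied in several pieces, each counted separately. */
int copyLines(FILE *in, FILE *out)
{
   char line[256];
   int count = 0;

   /* fgets returns NULL at end-of-file (or on a read error) */
   while (fgets(line, sizeof line, in) != NULL) {
      fputs(line, out);   /* fgets keeps the '\n'; fputs adds none */
      count++;
   }
   return count;
}
```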

23.4 Creating a Sequential Access File

C imposes no structure on a file. Thus, notions like a record of a file do not exist as part of the C language. Therefore, the programmer must provide any file structure to meet the requirements of each particular application. In the following example, we see how the programmer may impose a record structure on a file.
The program of Fig. 23.3 creates a simple sequential access file that might be used in an accounts receivable system to help keep track of the amounts owed by a company's credit clients. For each client, the program obtains an account number, the client's name, and the client's balance (i.e., the amount the client owes the company for goods and services received in the past). The data obtained for each client constitutes a "record" for that client. The account number is used as the record key in this application--the file will be created and maintained in account number order. This program assumes the user enters the records in account number order. In a comprehensive accounts receivable system, a sorting capability would be provided so the user could enter the records in any order. The records would then be sorted and written to the file.
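The heart of such a program is writing each record in a form that can be read back field by field. This sketch is not the code of Fig. 23.3. The helper name is hypothetical, and the sketch assumes, as this chapter does, that each record is written as one line of text with its fields separated by spaces, so that fscanf can later read them back in the same order.

```c
#include <stdio.h>

/* Hypothetical helper: writes one client record (account number, name,
   balance) to a sequential file as a line of text.
   Returns 0 on success, -1 if fprintf reports an error. */
int writeClient(FILE *fPtr, int account, const char *name, double balance)
{
   /* fprintf returns a negative value on an output error */
   return fprintf(fPtr, "%d %s %.2f\n", account, name, balance) < 0 ? -1 : 0;
}
```

Calling writeClient for each record in increasing account number order produces the kind of file described above. Because %s stops at white space when the file is read back, the name is assumed to be a single word.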
Now let us examine this program. The statement

FILE *cfPtr;

states that cfPtr is a pointer to a FILE structure. The C program administers each file with a separate FILE structure. The programmer need not know the specifics of the FILE structure to use files. We will soon see precisely how the FILE structure leads indirectly to the operating system's file control block (FCB) for a file.
Each open file must have a separately declared pointer of type FILE * that is used to refer to the file. The line

if ((cfPtr = fopen("clients.dat", "w")) == NULL)

names the file--"clients.dat"--to be used by the program and establishes a "line of communication" with the file. The file pointer cfPtr is assigned a pointer to the FILE structure for the file opened with fopen.
Function fopen takes two arguments: a file name and a file open mode. The file open mode "w" indicates that the file is to be opened for writing. If a file does not exist and it is opened for writing, fopen creates the file. If an existing file is opened for writing, the contents of the file are discarded without warning. In the program, the if structure is used to determine whether the file pointer cfPtr is NULL (i.e., the file could not be opened). If it is NULL, an error message is printed and the program ends. Otherwise, the input is processed and written to the file.
The program prompts the user to enter the various fields for each record, or to enter end-of-file when data entry is complete.

Figure 23.4 lists the key combinations for entering end-of-file for various computer systems.
The line

while (!feof(stdin))

uses function feof to determine whether the end-of-file indicator is set for the file to which stdin refers. The end-of-file indicator informs the program that there is no more data to be processed. In the program of Fig. 23.3, the end-of-file indicator is set for the standard input when the user enters the end-of-file key combination. The argument to function feof is a pointer to the file being tested for the end-of-file indicator (stdin in this case). The function returns a nonzero value (true) once the end-of-file indicator has been set; otherwise, zero is returned. Note that the indicator is set only after an input operation has attempted to read past the end of the data; feof does not predict that the next read will fail. The while structure that includes the feof call in this program continues executing while the end-of-file indicator is not set.
The statement

fprintf(cfPtr, "%d %s %.2f\n", account,
        name, balance);

writes data to the file clients.dat. The data may be retrieved later by a program designed to read the file (see Section 23.5). Function fprintf is equivalent to printf except that fprintf also receives as an argument a file pointer for the file to which the data will be written.
After the user enters end-of-file, the program closes the clients.dat file with fclose and terminates. Function fclose also receives the file pointer (rather than the file name) as an argument. If function fclose is not called explicitly, the operating system normally will close the file when program execution terminates. This is an example of operating system "housekeeping."
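The input loop described above can be sketched as a function that reads from any input stream and writes text records with fprintf. This is a variant, not the code of Fig. 23.3: createRecords is a hypothetical name, balance is assumed to be a double (hence %lf), names are assumed to fit in 29 characters, and the loop tests fscanf's return value instead of calling feof, so a partially entered record is never written.

```c
#include <stdio.h>

/* Reads "account name balance" triples from in and writes each one
   to out as a line of text. Stops as soon as fscanf cannot convert
   all three fields (end-of-file or bad input).
   Returns the number of records written. */
int createRecords(FILE *in, FILE *out)
{
    int account;
    char name[30];    /* assumed maximum: 29 characters plus '\0' */
    double balance;   /* assumed type of the balance field */
    int count = 0;

    /* %29s keeps fscanf from overflowing name */
    while (fscanf(in, "%d%29s%lf", &account, name, &balance) == 3) {
        fprintf(out, "%d %s %.2f\n", account, name, balance);
        ++count;
    }
    return count;
}
```

In a program like Fig. 23.3, it would be called as createRecords(stdin, cfPtr) after a successful fopen("clients.dat", "w"), followed by fclose(cfPtr).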
In the sample execution for the program of Fig. 23.3, the user enters information for five accounts, and then enters end-of-file to signal that data entry is complete. The sample execution does not show how the data records actually appear in the file. To verify that the file has been created successfully, in the next section we present a program that reads the file and prints its contents.

Figure 23.5 illustrates the relationship between FILE pointers, FILE structures, and FCBs in memory. When the file "clients.dat" is opened, an FCB for the file is copied into memory. The figure shows the connection between the file pointer returned by fopen and the FCB used by the operating system to administer the file.
Programs may process no files, one file, or several files. Each file used in a program must have a unique name and will have a different file pointer returned by fopen. All subsequent file processing functions after the file is opened must refer to the file with the appropriate file pointer. Files may be opened in one of several modes. To create a file, or to discard the contents of a file before writing data, open the file for writing ("w"). To read an existing file, open it for reading ("r"). To add records to the end of an existing file, open the file for appending ("a"). To open a file so that it may be written to and read from, open the file for updating in one of the three update modes: "r+", "w+", or "a+". Mode "r+" opens an existing file for reading and writing. Mode "w+" creates a file for reading and writing; if the file already exists, it is opened and its current contents are discarded. Mode "a+" opens a file for reading and writing, with all writing done at the end of the file; if the file does not exist, it is created.
If an error occurs while opening a file in any mode, fopen returns NULL.
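The effect of several open modes on the same file can be seen in a short sketch; demoModes and the file name passed to it are hypothetical. It also shows a rule that applies to all update modes: between a write and a following read on the same stream, the program must call fflush or a positioning function such as fseek or rewind.

```c
#include <stdio.h>

/* Shows how modes "w", "a" and "r+" treat a file. fileName is chosen
   by the caller; any existing file with that name is overwritten.
   Returns the number of lines in the final file, or -1 on error. */
int demoModes(const char *fileName)
{
    FILE *fp;
    int c, lines = 0;

    /* "w": create the file, or discard its contents if it exists */
    if ((fp = fopen(fileName, "w")) == NULL)
        return -1;
    fputs("first\n", fp);
    fclose(fp);

    /* "a": every write goes to the end; existing data is kept */
    if ((fp = fopen(fileName, "a")) == NULL)
        return -1;
    fputs("second\n", fp);
    fclose(fp);

    /* "r+": read and write; the file must already exist and
       nothing is discarded */
    if ((fp = fopen(fileName, "r+")) == NULL)
        return -1;
    fputc('F', fp);   /* overwrite the 'f' of "first" in place */
    rewind(fp);       /* positioning call required between a write
                         and a following read on an update stream */
    while ((c = fgetc(fp)) != EOF)
        if (c == '\n')
            ++lines;
    fclose(fp);
    return lines;     /* file now holds "First\nsecond\n" */
}
```

Calling demoModes twice on the same name still reports 2 lines, because the first fopen in mode "w" discards whatever the previous call left behind.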

23.5 Reading Data from a Sequential Access File

Data are stored in files so that the data may be retrieved for processing when needed. The previous section demonstrated how to create a file for sequential access. In this section, we discuss how to read data sequentially from a file.
The program of Fig. 23.7 reads records from the file "clients.dat" created by the program of Fig. 23.3, and prints the contents of the records. The statement

FILE *cfPtr;

indicates that cfPtr is a pointer to a FILE. The line

if ((cfPtr = fopen("clients.dat", "r")) == NULL)

attempts to open the file "clients.dat" for reading ("r"), and determines whether the file is opened successfully (i.e., fopen does not return NULL). The statement

fscanf(cfPtr, "%d%s%f", &account, name, &balance);

reads a "record" from the file. Function fscanf is equivalent to function scanf except fscanf receives as an argument a file pointer for the file from which the data is read. After the preceding statement is executed the first time, account will have the value 100, name will have the value "Jones", and balance will have the value 24.98. Each time the second fscanf statement is executed, another record is read from the file and account, name, and balance take on new values. When the end of the file is reached, the file is closed and the program terminates.
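A reading loop like the one described for Fig. 23.7 might look as follows. This is a sketch under assumptions: printRecords is a hypothetical name, balance is taken to be a double read with %lf (a float would use %f, as in the statement above), and the loop condition checks that fscanf converted all three fields, which also stops the loop at end-of-file.

```c
#include <stdio.h>

/* Reads "account name balance" text records from cfPtr and prints
   them in columns to out. Returns the number of records read. */
int printRecords(FILE *cfPtr, FILE *out)
{
    int account;
    char name[30];     /* assumed maximum name length */
    double balance;    /* assumed type of the balance field */
    int count = 0;

    fprintf(out, "%-10s%-13s%s\n", "Account", "Name", "Balance");
    while (fscanf(cfPtr, "%d%29s%lf", &account, name, &balance) == 3) {
        fprintf(out, "%-10d%-13s%7.2f\n", account, name, balance);
        ++count;
    }
    return count;
}
```

With the file opened as in the line above, printRecords(cfPtr, stdout) prints the report to the screen.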
To retrieve data sequentially from a file, a program normally starts reading from the beginning of the file, and reads all data consecutively until the desired data are found. It may be desirable to process the data sequentially in a file several times (from the beginning of the file) during the execution of a program. A statement such as

rewind(cfPtr);

causes a program's file position pointer--indicating the number of the next byte in the file to be read or written--to be repositioned to the beginning of the file (i.e., byte 0) pointed to by cfPtr. The file position pointer is not really a pointer. Rather it is an integer value that specifies the byte location in the file at which the next read or write is to occur. This is sometimes referred to as the file offset. The file position pointer is a member of the FILE structure associated with each file.
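The following sketch shows rewind resetting the file position pointer, using the standard function ftell (which reports the current position) to observe it. positionAfterFirstLine is a hypothetical name. On a binary stream, such as the one tmpfile creates, the value ftell returns is a byte count; for text streams the C standard only guarantees that the value can be passed back to fseek.

```c
#include <stdio.h>

/* Reads the first line of fp, notes where the file position pointer
   ended up, then rewinds so the next read starts again at byte 0.
   Returns the offset reached after the first line, or -1. */
long positionAfterFirstLine(FILE *fp)
{
    char line[128];   /* assumed long enough for one record */
    long pos = -1L;

    if (fgets(line, sizeof line, fp) != NULL)
        pos = ftell(fp);   /* offset just past the newline */
    rewind(fp);            /* like fseek(fp, 0L, SEEK_SET), and also
                              clears the stream's error indicator */
    return pos;
}
```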
We now present a program (Fig. 23.8) that allows a credit manager to obtain lists of customers with zero balances (i.e., customers who do not owe any money), customers with credit balances (i.e., customers to whom the company owes money), and customers with debit balances (i.e., customers who owe the company money for goods and services received). A credit balance is a negative amount; a debit balance is a positive amount.
The program displays a menu and allows the credit manager to enter one of four options. Option 1 produces a list of accounts with zero balances. Option 2 produces a list of accounts with credit balances. Option 3 produces a list of accounts with debit balances. Option 4 terminates program execution.
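Each menu choice requires one pass over the file, and rewind lets the same open file be scanned again for every choice. The sketch below counts, rather than lists, the matching accounts to stay short; countBalances and the double balance field are assumptions, not the code of Fig. 23.8.

```c
#include <stdio.h>

/* Counts records whose balance is zero (choice 1), negative
   (choice 2, a credit balance) or positive (choice 3, a debit
   balance) in a file of "account name balance" text records. */
int countBalances(FILE *cfPtr, int choice)
{
    int account, count = 0;
    char name[30];
    double balance;

    rewind(cfPtr);   /* every inquiry starts at the beginning */
    while (fscanf(cfPtr, "%d%29s%lf", &account, name, &balance) == 3) {
        if ((choice == 1 && balance == 0.0) ||
            (choice == 2 && balance < 0.0) ||
            (choice == 3 && balance > 0.0))
            ++count;
    }
    return count;
}
```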
Note that data in this type of sequential file cannot be modified without the risk of destroying other data in the file. For example, if the name "White" needs to be changed to "Worthington," the old name cannot simply be overwritten. The record for White was written to the file as

300 White 0.00

If the record is rewritten beginning at the same location in the file using the new name, the record would be

300 Worthington 0.00

The new record is larger than the original record. The characters beyond the second "o" in "Worthington" would overwrite the beginning of the next sequential record in the file. The problem here is that in the formatted input/output model using fprintf and fscanf, fields (and hence records) can vary in size. For example, 7, 14, -117, 2074, and 27383 are all ints stored internally in the same number of bytes, but when they are printed on the screen or written to disk with fprintf, they become fields of different sizes.
Therefore, sequential access with fprintf and fscanf is not usually used to update records in place. Instead, the entire file is usually rewritten. To make the preceding name change, the records before 300 White 0.00 in such a sequential access file would be copied to a new file, the new record would be written, and the records after 300 White 0.00 would be copied to the new file. This requires processing every record in the file to update one record.
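The copy-and-replace update described above can be sketched as follows; renameAccount is hypothetical, and records are assumed to have the "account name balance" text format written by fprintf, with balance stored as a double.

```c
#include <stdio.h>

/* Copies every record from oldFile to newFile, substituting newName
   in the record whose account number is acct.
   Returns 1 if the account was found, 0 if not, -1 on I/O error. */
int renameAccount(FILE *oldFile, FILE *newFile, int acct,
                  const char *newName)
{
    int account, found = 0;
    char name[30];
    double balance;

    while (fscanf(oldFile, "%d%29s%lf", &account, name, &balance) == 3) {
        if (account == acct) {
            fprintf(newFile, "%d %s %.2f\n", account, newName, balance);
            found = 1;
        } else {
            fprintf(newFile, "%d %s %.2f\n", account, name, balance);
        }
    }
    return (ferror(oldFile) || ferror(newFile)) ? -1 : found;
}
```

One common way to finish the job is to write to a file with a temporary name, close both files, and then call the standard functions remove (on the original) and rename (on the new file) so the new file takes the old file's name.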

23.6 Random Access Files

As we stated previously, records in a file created with the formatted output function fprintf are not necessarily the same length. However, individual records of a randomly accessed file are normally fixed in length and may be accessed directly (and thus quickly) without searching through other records. This makes randomly accessed files appropriate for airline reservation systems, banking systems, point-of-sale systems, and other kinds of transaction processing systems that require rapid access to specific data. There are other ways of implementing randomly accessed files, but we will limit our discussion to this straightforward approach using fixed-length records.
Because every record in a randomly accessed file normally has the same length, the exact location of a record relative to the beginning of the file can be calculated as a function of the record key. We will soon see how this facilitates immediate access to specific records, even in large files.
Figure 23.10 illustrates one way to implement a randomly accessed file. Such a file is like a freight train with many cars, some empty and some with cargo. Each car in the train is the same length.
Data can be inserted in a randomly accessed file without destroying other data in the file. Data stored previously can also be updated or deleted without rewriting the entire file. In the following sections we explain how to create a randomly accessed file, enter data, read the data both sequentially and randomly, update the data, and delete data no longer needed.

23.7 Creating a Randomly Accessed File

Function fwrite transfers a specified number of bytes beginning at a specified location in memory to a file. The data is written beginning at the location in the file indicated by the file position pointer. Function fread transfers a specified number of bytes from the location in the file specified by the file position pointer to an area in memory beginning with a specified address. Now, when writing an integer, instead of using

fprintf(fPtr, "%d", number);

which could print as few as 1 digit or as many as 11 digits (10 digits plus a sign, each of which requires 1 byte of storage) for a 4-byte integer, we can use

fwrite(&number, sizeof(int), 1, fPtr);

which always writes sizeof(int) bytes (4 bytes on a system with 4-byte integers, or 2 bytes on a system with 2-byte integers) from variable number to the file represented by fPtr (we will explain the 1 argument shortly). Later, fread can be used to read those bytes back into integer variable number. Although fread and fwrite read and write data such as integers in fixed-size rather than variable-size format, the data they handle is processed in computer "raw data" format (i.e., bytes of data) rather than in printf's and scanf's human-readable format.
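The size difference between formatted and raw output can be measured directly. This sketch (bytesUsed is a hypothetical name) writes the same int with fprintf and with fwrite to two temporary binary files and asks ftell how many bytes each produced.

```c
#include <stdio.h>

/* Stores in *textBytes and *rawBytes the number of bytes that
   fprintf("%d") and fwrite produce for number; both are -1 if a
   temporary file could not be created. */
void bytesUsed(int number, long *textBytes, long *rawBytes)
{
    FILE *text = tmpfile();
    FILE *raw = tmpfile();

    *textBytes = *rawBytes = -1L;
    if (text != NULL && raw != NULL) {
        fprintf(text, "%d", number);            /* length varies with value */
        fwrite(&number, sizeof(int), 1, raw);   /* always sizeof(int) bytes */
        *textBytes = ftell(text);
        *rawBytes = ftell(raw);
    }
    if (text != NULL)
        fclose(text);
    if (raw != NULL)
        fclose(raw);
}
```

For the values 7, -117, and 27383 the text sizes are 1, 4, and 5 bytes, while the raw size is sizeof(int) every time.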
The fwrite and fread functions are capable of reading and writing arrays of data to and from disk. The third argument of both fread and fwrite is the number of elements in the array that should be read from disk or written to disk. The preceding fwrite function call writes a single integer to disk, so the third argument is 1 (as if one element of an array is being written).
File processing programs rarely write a single field to a file. Normally, they write one struct at a time, as we show in the following examples.
Consider the following problem statement:
Create a credit processing system capable of storing up to 100 fixed-length records. Each record should consist of an account number that will be used as the record key, a last name, a first name, and a balance. The resulting program should be able to update an account, insert a new account record, delete an account, and list all the account records in a formatted text file for printing. Use a randomly accessed file.
The next several sections introduce the techniques necessary to create the credit processing program. The program of Fig. 23.11 shows how to open a randomly accessed file, define a record format using a struct, write data to the disk, and close the file. This program initializes all 100 records of the file "credit.dat" with empty structs using function fwrite. Each empty struct contains 0 for the account number, an empty string (represented by empty quotation marks) for the last name, an empty string for the first name, and 0.0 for the balance. The file is initialized in this manner to create space on the disk in which the file will be stored, and to make it possible to determine if a record contains data.
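Initializing the file might look like the following sketch. The struct layout (member names and array sizes) is an assumption for illustration, since Fig. 23.11 itself is not reproduced here, and createBlankFile is hypothetical. The file is opened in binary mode ("wb") so that no newline translation is applied to the raw bytes on systems that distinguish text files from binary files.

```c
#include <stdio.h>

/* Assumed record layout (illustrative field sizes) */
struct clientData {
    int acctNum;
    char lastName[15];
    char firstName[10];
    double balance;
};

/* Writes numRecords blank records (account 0, empty names, balance
   0.0) to fileName, replacing any existing contents.
   Returns 0 on success, -1 on failure. */
int createBlankFile(const char *fileName, int numRecords)
{
    struct clientData blankClient = {0, "", "", 0.0};
    FILE *cfPtr;
    int i;

    if ((cfPtr = fopen(fileName, "wb")) == NULL)
        return -1;
    for (i = 0; i < numRecords; ++i) {
        if (fwrite(&blankClient, sizeof(struct clientData), 1, cfPtr) != 1) {
            fclose(cfPtr);
            return -1;
        }
    }
    return fclose(cfPtr) == 0 ? 0 : -1;
}
```

A call such as createBlankFile("credit.dat", 100) would then produce a file of exactly 100 * sizeof(struct clientData) bytes.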
Function fwrite writes a block (specific number of bytes) of data to a file. In our program, the statement

fwrite(&blankClient, sizeof(struct clientData),
     1, cfPtr);

causes the structure blankClient of size sizeof(struct clientData) to be written to the file pointed to by cfPtr. The operator sizeof returns the size in bytes of the type or object contained in parentheses (in this case the type struct clientData). The sizeof operator is a compile-time unary operator that returns an unsigned integer of type size_t. The sizeof operator can be used to determine the size in bytes of any data type or expression. For example, sizeof(int) is used to determine whether an integer is stored in 2 or 4 bytes on a particular computer.
Function fwrite can actually be used to write several elements of an array of objects. To write several array elements, the programmer supplies a pointer to an array as the first argument in the call to fwrite, and specifies the number of elements to be written as the third argument in the call to fwrite. In the preceding statement, fwrite was used to write a single object that was not an array element. Writing a single object is equivalent to writing one element of an array, hence the 1 in the fwrite call.
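Writing several elements at once looks like this in a minimal sketch; roundTripArray is a hypothetical helper that writes an array to a temporary file with one fwrite call and reads it back with one fread call. Both functions return the number of elements transferred, not the number of bytes.

```c
#include <stdio.h>

/* Writes count ints from src with a single fwrite, reads them back
   into dst with a single fread, and returns how many were read. */
size_t roundTripArray(const int *src, int *dst, size_t count)
{
    FILE *fp = tmpfile();
    size_t n;

    if (fp == NULL)
        return 0;
    fwrite(src, sizeof(int), count, fp);    /* count elements of sizeof(int) bytes */
    rewind(fp);
    n = fread(dst, sizeof(int), count, fp); /* elements read, not bytes */
    fclose(fp);
    return n;
}
```

With int a[5], the element count can be written as sizeof a / sizeof a[0], so it stays correct if the array's size changes.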

23.8 Writing Data Randomly to a Randomly Accessed File

The program of Fig. 23.12 writes data to the file "credit.dat". It uses the combination of fseek and fwrite to store data at specific locations in the file. Function fseek sets the file position pointer to a specific position in the file, then fwrite writes the data.
The statement

fseek(cfPtr, (accountNum - 1) 
   * sizeof(struct clientData), SEEK_SET);

positions the file position pointer for the file referenced by cfPtr to the byte location calculated by (accountNum - 1) * sizeof(struct clientData); the value of this expression is called the offset or the displacement. Because the account number is between 1 and 100 but the byte positions in the file start with 0, 1 is subtracted from the account number when calculating the byte location of the record. Thus, for account number 1, the byte location calculated is 0, and the file position pointer is set to the beginning of the file. The symbolic constant SEEK_SET indicates that the file position pointer is positioned relative to the beginning of the file by the amount of the offset.
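The seek-then-write step can be packaged as a small function. This is a sketch assuming an illustrative struct clientData layout and a file already opened for update in binary mode; writeRecord is a hypothetical name. It also checks fseek's return value, which is zero when the seek succeeds.

```c
#include <stdio.h>

/* Assumed record layout (illustrative field sizes) */
struct clientData {
    int acctNum;
    char lastName[15];
    char firstName[10];
    double balance;
};

/* Stores *client in the slot for client->acctNum (numbered from 1).
   Returns 0 on success, -1 on error. */
int writeRecord(FILE *cfPtr, const struct clientData *client)
{
    long offset;

    if (client->acctNum < 1)
        return -1;
    /* record n starts (n - 1) record lengths from byte 0 */
    offset = (long)(client->acctNum - 1) * (long)sizeof(struct clientData);
    if (fseek(cfPtr, offset, SEEK_SET) != 0)
        return -1;
    return fwrite(client, sizeof(struct clientData), 1, cfPtr) == 1 ? 0 : -1;
}
```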

Figure 23.14 illustrates the file pointer referring to a FILE structure in memory. The file position pointer indicates that the next byte to be read or written is 5 bytes from the beginning of the file.
The ANSI standard shows the function prototype for fseek as

int fseek(FILE *stream, long int offset, int whence);

where offset is the number of bytes from location whence in the file pointed to by stream. The argument whence can have one of three values, SEEK_SET, SEEK_CUR, or SEEK_END, indicating the location in the file from which the seek begins. SEEK_SET indicates that the seek starts at the beginning of the file; SEEK_CUR indicates that the seek starts at the current location in the file; and SEEK_END indicates that the seek starts at the end of the file. These three symbolic constants are defined in the stdio.h header file. Function fseek returns zero if the seek succeeds and a nonzero value if the request cannot be satisfied.
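SEEK_END and SEEK_CUR are useful with fixed-length records, for example to find how many records a file holds or to skip ahead by a whole number of records. The two helpers below, countRecords and skipRecords, are hypothetical. One caveat: the C standard does not require binary streams to support SEEK_END meaningfully, although common systems do.

```c
#include <stdio.h>

/* Returns how many records of recSize bytes fp holds, found with
   SEEK_END, and restores the original position afterwards.
   Returns -1 if any positioning call fails. */
long countRecords(FILE *fp, size_t recSize)
{
    long saved = ftell(fp);   /* current position, restored below */
    long end;

    if (saved == -1L || fseek(fp, 0L, SEEK_END) != 0)
        return -1L;
    end = ftell(fp);          /* offset 0 from the end = file size */
    if (end == -1L || fseek(fp, saved, SEEK_SET) != 0)
        return -1L;
    return end / (long)recSize;
}

/* Moves forward n records from the current position (SEEK_CUR).
   Returns fseek's result: 0 on success, nonzero on failure. */
int skipRecords(FILE *fp, long n, size_t recSize)
{
    return fseek(fp, n * (long)recSize, SEEK_CUR);
}
```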

23.9 Reading Data Randomly from a Randomly Accessed File

Function fread reads a specified number of bytes from a file into memory. For example, the statement

fread(&client, sizeof(struct clientData), 1, cfPtr);

reads the number of bytes determined by sizeof(struct clientData) from the file referenced by cfPtr and stores the data in the structure client. The bytes are read from the location in the file specified by the file position pointer. Function fread can be used to read several fixed-size array elements by providing a pointer to the array in which the elements will be stored, and by indicating the number of elements to be read. The preceding statement specifies that one element should be read. To read more than one element, specify the number of elements in the third argument of the fread statement. Function fread returns the number of elements actually read, which is less than the number requested if the end of the file is reached or an error occurs.
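Combining fseek and fread gives random reading of a single record. This sketch assumes an illustrative struct clientData layout; readRecord is hypothetical. It uses fread's return value to detect a record number beyond the end of the file.

```c
#include <stdio.h>

/* Assumed record layout (illustrative field sizes) */
struct clientData {
    int acctNum;
    char lastName[15];
    char firstName[10];
    double balance;
};

/* Reads the record for acctNum (numbered from 1) into *client.
   Returns 1 if the record holds data, 0 if the slot is empty
   (acctNum member is 0), -1 if the seek or read failed. */
int readRecord(FILE *cfPtr, int acctNum, struct clientData *client)
{
    long offset;

    if (acctNum < 1)
        return -1;
    offset = (long)(acctNum - 1) * (long)sizeof(struct clientData);
    if (fseek(cfPtr, offset, SEEK_SET) != 0)
        return -1;
    /* fread returns 1 here, or 0 at end-of-file or on error */
    if (fread(client, sizeof(struct clientData), 1, cfPtr) != 1)
        return -1;
    return client->acctNum != 0;
}
```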
The program of Fig. 23.15 reads sequentially every record in the "credit.dat" file, determines whether each record contains data, and prints the formatted data for records containing data. The feof function determines when the end of the file is reached, and the fread function transfers data from the disk to the clientData structure client.
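A sequential pass over the fixed-length records might look like the sketch below (listRecords and the struct layout are assumptions). It controls the loop with fread's return value rather than with feof: the end-of-file indicator is set only after a read has already failed, so a loop that tests feof before calling fread can process the last record twice unless it also checks what fread returned.

```c
#include <stdio.h>

/* Assumed record layout (illustrative field sizes) */
struct clientData {
    int acctNum;
    char lastName[15];
    char firstName[10];
    double balance;
};

/* Prints every record whose acctNum is nonzero, in file order.
   Returns the number of records printed. */
int listRecords(FILE *cfPtr, FILE *out)
{
    struct clientData client;
    int printed = 0;

    rewind(cfPtr);
    while (fread(&client, sizeof(struct clientData), 1, cfPtr) == 1) {
        if (client.acctNum != 0) {   /* 0 marks an empty slot */
            fprintf(out, "%-6d%-16s%-11s%10.2f\n", client.acctNum,
                    client.lastName, client.firstName, client.balance);
            ++printed;
        }
    }
    return printed;
}
```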

23.10 Case Study: A Transaction Processing Program

We now present a substantial transaction processing program using randomly accessed files. The program maintains a bank's account information. The program updates existing accounts, adds new accounts, deletes accounts, and stores a listing of all the current accounts in a text file for printing. We assume that the program of Fig. 23.11 has been executed to create the file credit.dat.
The program has five options. Option 1 calls function textFile to store a formatted list of all the accounts in a text file called accounts.txt that may be printed later. The function uses fread and the sequential file access techniques used in the program of Fig. 23.15. After option 1 is chosen, the file accounts.txt contains:

Acct  Last Name       First Name    Balance
29    Brown           Nancy          -24.54
33    Dunn            Stacey         314.33
37    Barker          Doug             0.00
88    Smith           Dave           258.34
96    Stone           Sam             34.98

Option 2 calls the function updateRecord to update an account. The function will only update a record that already exists, so the function first checks to see if the record specified by the user is empty. The record is read into structure client with fread, then member acctNum is compared to 0. If it is 0, the record contains no information, and a message is printed stating that the record is empty. Then, the menu choices are displayed. If the record contains information, function updateRecord inputs the transaction amount, calculates the new balance, and rewrites the record to the file. A typical output for option 2 is:


Enter account to update (1 - 100): 37
37    Barker          Doug             0.00

Enter charge (+) or payment (-): +87.99
37    Barker          Doug            87.99
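The update in option 2 is a read-modify-write of one record, which can be sketched as follows (updateBalance and the struct layout are assumptions, not the code of Fig. 23.16). After fread, the file position pointer has moved past the record, so the function seeks back to the record's start before calling fwrite; that fseek also satisfies the C rule that input on an update stream may not be followed directly by output without an intervening positioning call.

```c
#include <stdio.h>

/* Assumed record layout (illustrative field sizes) */
struct clientData {
    int acctNum;
    char lastName[15];
    char firstName[10];
    double balance;
};

/* Adds transaction to the balance of account acctNum in a file
   opened for update. Returns 0 on success, 1 if the record is
   empty, -1 on error. */
int updateBalance(FILE *fPtr, int acctNum, double transaction)
{
    struct clientData client;
    long offset;

    if (acctNum < 1)
        return -1;
    offset = (long)(acctNum - 1) * (long)sizeof(struct clientData);

    if (fseek(fPtr, offset, SEEK_SET) != 0 ||
        fread(&client, sizeof(struct clientData), 1, fPtr) != 1)
        return -1;
    if (client.acctNum == 0)
        return 1;                      /* empty slot: nothing to update */

    client.balance += transaction;
    if (fseek(fPtr, offset, SEEK_SET) != 0 ||   /* back to record start */
        fwrite(&client, sizeof(struct clientData), 1, fPtr) != 1)
        return -1;
    return 0;
}
```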

Option 3 calls the function newRecord to add a new account to the file. If the user enters an account number for an existing account, newRecord displays an error message that the record already contains information, and the menu choices are printed again. This function uses the same process to add a new account as does the program in Fig. 23.12. A typical output for option 3 is:

Enter new account number (1 - 100): 22
Enter lastname, firstname, balance
? Johnston Sarah 247.45

Option 4 calls function deleteRecord to delete a record from the file. Deletion is accomplished by asking the user for the account number and reinitializing the record. If the account contains no information, deleteRecord displays an error message that the account does not exist. Option 5 terminates program execution. The program is shown in Fig. 23.16. Note that the file "credit.dat" is opened for update (reading and writing) using "r+" mode.
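Deleting by reinitializing can be sketched in the same style: the slot is overwritten with a blank record, the same empty marker the file was initialized with. clearAccount and the struct layout are hypothetical.

```c
#include <stdio.h>

/* Assumed record layout (illustrative field sizes) */
struct clientData {
    int acctNum;
    char lastName[15];
    char firstName[10];
    double balance;
};

/* Overwrites the record for acctNum with a blank record.
   Returns 0 on success, 1 if the slot was already empty,
   -1 on error. */
int clearAccount(FILE *fPtr, int acctNum)
{
    struct clientData client;
    struct clientData blankClient = {0, "", "", 0.0};
    long offset;

    if (acctNum < 1)
        return -1;
    offset = (long)(acctNum - 1) * (long)sizeof(struct clientData);

    if (fseek(fPtr, offset, SEEK_SET) != 0 ||
        fread(&client, sizeof(struct clientData), 1, fPtr) != 1)
        return -1;
    if (client.acctNum == 0)
        return 1;                      /* account does not exist */

    if (fseek(fPtr, offset, SEEK_SET) != 0 ||
        fwrite(&blankClient, sizeof(struct clientData), 1, fPtr) != 1)
        return -1;
    return 0;
}
```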

23.11 Summary

* All data items processed by a computer are reduced to combinations of zeros and ones.
* The smallest data item in a computer can assume the value 0 or the value 1. Such a data item is called a bit (short for "binary digit," a digit that can assume one of two values).
* Digits, letters, and special symbols are referred to as characters. The set of all characters that may be used to write programs and represent data items on a particular computer is called that computer's character set. Every character in the computer's character set is represented as a pattern of eight 1s and 0s (called a byte).
* A field is a group of characters that conveys meaning.
* A record is a group of related fields.
* At least one field in each record is normally chosen as a record key. The record key identifies a record as belonging to a particular person or entity.
* The most popular type of organization for records in a file is called a sequential access file, in which records are accessed consecutively until the desired data are located.
* A group of related files is sometimes called a database. A collection of programs designed to create and manage databases is called a database management system (DBMS).
* C views each file simply as a sequential stream of bytes.
* C automatically opens three files and their associated streams (standard input, standard output, and standard error) when program execution begins.
* The file pointers assigned to the standard input, standard output, and standard error are stdin, stdout, and stderr, respectively.
* Function fgetc reads a character from a specified file.
* Function fputc writes a character to a specified file.
* Function fgets reads a line from a specified file.
* Function fputs writes a line to a specified file.
* FILE is a structure type defined in the stdio.h header file. The programmer need not know the specifics of this structure to use files. As a file is opened, a pointer to the file's FILE structure is returned.
* Function fopen takes two arguments, a file name and a file open mode, and opens the file. If an existing file is opened for writing, its contents are discarded without warning. If the file does not exist and the file is being opened for writing, fopen creates the file.
* Function feof determines whether the end-of-file indicator for a file has been set.
* Function fprintf is equivalent to printf except fprintf receives as an argument a pointer to the file to which the data will be written.
* Function fclose closes the file pointed to by its argument.
* To create a file, or to discard the contents of a file before writing data, open the file for writing ("w"). To read an existing file, open it for reading ("r"). To add records to the end of an existing file, open the file for appending ("a"). To open a file so that it may be written to and read from, open the file for updating in one of the three updating modes: "r+", "w+", or "a+". Mode "r+" simply opens the file for reading and writing. Mode "w+" creates the file if it does not exist, and discards the current contents of the file if it does exist. Mode "a+" creates the file if it does not exist, and writing is done at the end of the file.
* Function fscanf is equivalent to scanf except fscanf receives as an argument a pointer to the file (normally other than stdin) from which the data will be read.
* Function rewind causes the program to reposition the file position pointer for the specified file to the beginning of the file.
* Random access file processing is used to access a record directly.
* To facilitate random access, data is stored in fixed-length records. Since every record is the same length, the computer can quickly calculate (as a function of the record key) the exact location of a record in relation to the beginning of the file.
* Data can be added easily to a random access file without destroying other data in the file. Data stored previously in a file with fixed-length records can also be changed and deleted without rewriting the entire file.
* Function fwrite writes a block (specific number of bytes) of data to a file.
* The compile-time operator sizeof returns the size in bytes of its operand.
* Function fseek sets the file position pointer to a specific position in a file based on the starting location of the seek in the file. The seek can start from one of three locations: SEEK_SET starts from the beginning of the file, SEEK_CUR starts from the current position in the file, and SEEK_END starts from the end of the file.
* Function fread reads a block (specific number of bytes) of data from a file.

This chapter does not contain any Testing and Debugging tips.

This chapter does not contain any Applet Examples.

This chapter does not contain any Software Engineering Observations.